Rules Frequency Order Stemmer for Malay Language

نویسندگان

  • Muhamad Taufik Abdullah
  • Fatimah Ahmad
  • Ramlan Mahmod
  • Tengku Mohd Tengku Sembok
چکیده

The importance of stemmer is obvious with the advent of effective information retrieval systems. Unfortunately, Malay stemming problems are difficult to solve due to complexity of words morphology. The Rules Application Order (RAO) stemmer is examined for enhancing performance to minimize the percentage of stemming errors. This paper presents a stemming approach called Rules Frequency Order (RFO). RFO rearranges the stemming rules according to the frequency of their usage from the previous execution. It shows that the approach provides a higher percentage of stemming correctness as compared to RAO stemming approach.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Literature Review and Discussion of Malay Rule - Based Affix Elimination Algorithms

Abstrak Stemming is one of the techniques in natural language processing that is used to reduce a word to its root. Information retrieval and knowledge management can further be improved by improving the stemming process. There are four strategies that are being used widely in stemming that includes table lookup, rule-based affix elimination, successor variety and n-gram. However , not all of t...

متن کامل

Stemming Hausa text: using affix-stripping rules and reference look-up

Stemming is a process of reducing a derivational or inflectional word to its root or stem by stripping all its affixes. It is been used in applications such as information retrieval, machine translation, and text summarization, as their preprocessing step to increase efficiency. Currently, there are a few stemming algorithms which have been developed for languages such as English, Arabic, Turki...

متن کامل

A Ruled-Based Part of Speech (RPOS) Tagger for Malay Text Articles

Abstrak The Malay language is an Austronesian language spoken in most countries in the South East Asia region that includes Malaysia, Indonesia, Singapore, Brunei and Thailand. Traditional linguistics is well developed for Malay but there are very limited resources and tools that are available or made accessible for computer linguistic analysis of Malay language. Assigning part of speech (POS) ...

متن کامل

Hand-crafted versus Machine-learned Inflectional Rules: The Euroling-SiteSeeker Stemmer and CST's Lemmatiser

The Euroling stemmer is developed for a commercial web site and intranet search engine called SiteSeeker. SiteSeeker is basically used in the Swedish domain but to some extent also for the English domain. CST’s lemmatiser comes from the Center for Language Technology, University of Copenhagen and was originally developed as a research prototype to create lemmatisation rules from training data. ...

متن کامل

Rule Based Urdu Stemmer

This paper presents Rule based Urdu Stemmer. In this technique rules are applied to remove suffix and prefix from the inflected words. Urdu is well spoken language all over the world but less work has been done on Urdu stemming. Stemmer helps us to find the root of the inflected word. Various possibilities of inflected words like ںو (vao+noon-gunna), ے (badi-ye), ںای (choti-ye+alif+noon-gunna) ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009